43 research outputs found

    Solving systems of symmetric Toeplitz tridiagonal equations: Rojo's algorithm revisited

    More than 20 years ago, Rojo published [1] an algorithm for solving linear systems whose matrix is symmetric, tridiagonal, Toeplitz, and diagonally dominant. The technique proposed by Rojo is very efficient, O(n), and has been applied successfully to similar problems such as circulant tridiagonal systems and pentadiagonal Toeplitz systems. In this article we extend Rojo's algorithm to the case of non-diagonally dominant matrices, thus completing a good tool for the aforementioned applications. Other algorithms that solve the same problem are also analysed and compared with the new version of Rojo's algorithm. © 2012 Elsevier Inc. All rights reserved.
    Supported by Spanish Government (Projects TIN2008-06570-C04 and TEC2009-13741), and Generalitat Valenciana (Project PROMETEO/2009/013).
    Vidal Maciá, AM.; Alonso-Jordá, P. (2012). Solving systems of symmetric Toeplitz tridiagonal equations: Rojo's algorithm revisited. Applied Mathematics and Computation. 219(4):1874-1889. https://doi.org/10.1016/j.amc.2012.08.030
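
To make the O(n) cost of this problem class concrete, the following is a minimal sketch of the standard Thomas algorithm (tridiagonal Gaussian elimination without pivoting) specialised to a symmetric Toeplitz tridiagonal matrix. It is not Rojo's algorithm or the extension described in the entry above. The function names and the parameters `a` (diagonal value) and `b` (off-diagonal value) are illustrative assumptions, and the sketch assumes every pivot is nonzero, which holds for example when |a| > 2|b|.

```python
def solve_sym_toeplitz_tridiag(a, b, rhs):
    """Solve T x = rhs, where T has `a` on the diagonal and `b` on both
    off-diagonals. Standard Thomas algorithm, O(n) time, no pivoting.
    Illustrative only: this is NOT Rojo's algorithm."""
    n = len(rhs)
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = b / a
    d[0] = rhs[0] / a
    # Forward elimination: one pass over the rows.
    for i in range(1, n):
        pivot = a - b * c[i - 1]  # assumed nonzero
        c[i] = b / pivot
        d[i] = (rhs[i] - b * d[i - 1]) / pivot
    # Back substitution: one pass in reverse.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def toeplitz_tridiag_matvec(a, b, x):
    """Multiply the same matrix T by a vector x (used to check a solution)."""
    n = len(x)
    y = []
    for i in range(n):
        left = b * x[i - 1] if i > 0 else 0.0
        right = b * x[i + 1] if i < n - 1 else 0.0
        y.append(left + a * x[i] + right)
    return y


if __name__ == "__main__":
    x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
    rhs = toeplitz_tridiag_matvec(4.0, 1.0, x_true)
    x = solve_sym_toeplitz_tridiag(4.0, 1.0, rhs)
    print(max(abs(u - v) for u, v in zip(x, x_true)))
```

Because the sketch never pivots, it can break down when a pivot is zero or tiny, which can happen once diagonal dominance is lost; that is the regime the entry above says the extended algorithm targets.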

    Block pivoting implementation of a symmetric Toeplitz solver

    Toeplitz matrices are characterized by a special structure that can be exploited in order to obtain fast linear system solvers. These solvers are difficult to parallelize due to their low computational cost and their closely coupled data operations. We propose to transform the Toeplitz system matrix into a Cauchy-like matrix, since the latter can be divided into two independent matrices of half the size of the system matrix, and each of these smaller matrices can be factorized efficiently on multicore computers. We use OpenMP and store data in memory by blocks in consecutive positions, yielding a simple and efficient algorithm. In addition, by exploiting the fact that diagonal pivoting does not destroy the special structure of Cauchy-like matrices, we introduce a local diagonal pivoting technique which improves the accuracy of the solution and the stability of the algorithm.
    This work was partially supported by the Spanish Ministerio de Ciencia e Innovacion (Project TIN2008-06570-C04-02 and TEC2009-13741), Vicerrectorado de Investigacion de la Universidad Politecnica de Valencia through PAID-05-10 (ref. 2705), and Generalitat Valenciana through project PROMETEO/2009/2013.
    Alonso-Jordá, P.; Dolz Zaragozá, MF.; Vidal Maciá, AM. (2014). Block pivoting implementation of a symmetric Toeplitz solver. Journal of Parallel and Distributed Computing. 74(5):2392-2399. https://doi.org/10.1016/j.jpdc.2014.02.003

    Improving the performance of water distribution systems’ simulation on multicore systems

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-015-1607-5
    Hydraulic solvers for the simulation of flows and pressures in water distribution systems (WDS) are used extensively, and their computational performance is key when considering optimization problems. This paper presents an approach to speed up the hydraulic solver using OpenMP with two efficient methods for WDS simulation. The paper identifies the different tasks carried out in the simulation, showing their contribution to the execution time, and selects the target tasks for parallelization. After describing the algorithms for the selected tasks, parallel OpenMP versions are derived, with emphasis on the task of linear system update. Results are presented for four different large WDS models, showing a considerable reduction in computing time.
    This work has been partially supported by Ministerio de Economia y Competitividad from Spain, under the project TEC2012-38142-C04-01, and by project PROMETEO FASE II 2014/003 of Generalitat Valenciana.
    Alvarruiz Bermejo, F.; Martínez Alzamora, F.; Vidal Maciá, AM. (2016). Improving the performance of water distribution systems’ simulation on multicore systems. Journal of Supercomputing. 1-13. https://doi.org/10.1007/s11227-015-1607-5

    Improving the efficiency of the loop method for the simulation of water distribution networks

    Efficiency of hydraulic solvers for the simulation of flows and pressures in water distribution systems (WDSs) is very important, especially in the context of optimization and risk analysis problems, where the hydraulic simulation has to be repeated many times. Among the methods used for hydraulic solvers, the most prominent nowadays is the global gradient algorithm (GGA), based on a hybrid node-loop formulation. Previously, another method based just on loop flow equations was proposed. It has the advantage of leading to a system matrix that is in most cases much smaller than in the GGA method, but it also has some disadvantages, mainly a less sparse system matrix and the fact that introducing some types of valves requires redefining the set of network loops initially defined. The contribution of this paper is to present solutions for overcoming these disadvantages of the method based on loop flow equations. In particular, efficient procedures are shown for selecting the network loops so as to achieve a highly sparse matrix, and methods are presented to incorporate check valves and automatic control valves while avoiding the need to redefine the loops initially selected. (C) 2015 American Society of Civil Engineers.
    This work has been partially supported by "Ministerio de Economia y Competitividad" from Spain, under the project TEC2012-38142-C04-01 and by PROMETEO FASE II 2014/003 project of Generalitat Valenciana.
    Alvarruiz Bermejo, F.; Martínez Alzamora, F.; Vidal Maciá, AM. (2015). Improving the efficiency of the loop method for the simulation of water distribution networks. Journal of Water Resources Planning and Management. 141(10):1-10. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000539

    Updating/downdating the NonNegative Matrix Factorization

    This is the author’s version of a work that was accepted for publication in Journal of Computational and Applied Mathematics. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Computational and Applied Mathematics 318 (2017) 59–68. DOI 10.1016/j.cam.2016.11.048.
    The Non-Negative Matrix Factorization (NNMF) is a recent numerical tool that, given a nonnegative data matrix, tries to obtain its factorization as the approximate product of two nonnegative matrices. Nowadays, this factorization is used in many scientific fields; in some of these fields, real-time computation of the NNMF is required. In some scenarios, not all data is initially available, and when new data (as new rows or columns) becomes available the NNMF must be recomputed. Recomputing the whole factorization every time is very costly and not suitable for real-time applications. In this paper we propose several algorithms to update the NNMF factorization taking advantage of the previously computed factorizations, with similar error and lower computational cost. © 2016 Elsevier B.V. All rights reserved.
    This work has been partially supported by EU together with Spanish Government through TEC2015-67387-C4-1-R (MINECO/FEDER), by Generalitat Valenciana through PROMETEOII/2014/003 and by Programa de FPU del Ministerio de Educacion, Cultura y Deporte FPU13/03828 (Spain). We want to thank Dr. Pedro Vera and his team (University of Jaen) for providing us with their music analysis software.
    San Juan Sebastián, P.; Vidal Maciá, AM.; García Mollá, VM. (2016). Updating/downdating the NonNegative Matrix Factorization. Journal of Computational and Applied Mathematics. 318:59-68. https://doi.org/10.1016/j.cam.2016.11.048
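
As a hedged illustration of what "updating" a factorization can mean, the block below states the NNMF problem and the classical Lee–Seung multiplicative updates, followed by one natural warm start when a new column arrives. The symbols A, W, H, k, a and h_0 are assumptions introduced here, and the warm start is a generic idea, not necessarily any of the algorithms proposed in the entry above.

```latex
% NNMF problem: A is m x n and nonnegative, k is the chosen inner dimension.
\min_{W \ge 0,\; H \ge 0} \; \lVert A - W H \rVert_F^2,
\qquad W \in \mathbb{R}^{m \times k},\; H \in \mathbb{R}^{k \times n}

% Lee--Seung multiplicative updates
% (\odot and \oslash denote elementwise product and division):
H \leftarrow H \odot \bigl( W^{T} A \bigr) \oslash \bigl( W^{T} W H \bigr),
\qquad
W \leftarrow W \odot \bigl( A H^{T} \bigr) \oslash \bigl( W H H^{T} \bigr)

% New column a arrives: reuse the previous factors instead of restarting.
A' = \begin{bmatrix} A & a \end{bmatrix},
\qquad
W'_{0} = W,
\qquad
H'_{0} = \begin{bmatrix} H & h_{0} \end{bmatrix},
\quad h_{0} \ge 0
```

Starting the iteration from W'_0 and H'_0 rather than from random factors is what lets a few update steps stand in for a full recomputation in this sketch.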

    Efficient Modeling of Active Control Valves in Water Distribution Systems Using the Loop Method

    This paper presents a novel approach to model pressure- and flow-regulating devices in the context of the Newton-Raphson loop method for water distribution network simulation. The proposed approach uses a symmetric matrix for the underlying linear systems, which enables simpler implementation and faster solution, while producing iterations very close to the global gradient algorithm of EPANET. The structure of the matrix is kept unchanged regardless of the operational status of the valves. The paper presents results that validate its formulation, accuracy, and speed in various case studies.
    Alvarruiz Bermejo, F.; Martínez Alzamora, F.; Vidal Maciá, AM. (2018). Efficient Modeling of Active Control Valves in Water Distribution Systems Using the Loop Method. Journal of Water Resources Planning and Management. 144(10):1-9. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000982

    A Pipeline for the QR Update in Digital Signal Processing

    The input and output signals of a digital signal processing system can often be represented by a rectangular matrix, as is the case for the beamformer algorithm, a very useful algorithm that allows extraction of the original input signal once it is cleaned of noise and room reverberation. We use a version of this algorithm in which the system matrix must be factorized to solve a least squares problem. The matrix changes periodically according to the sampled input signal; therefore, the factorization needs to be recalculated as fast as possible. In this paper, we propose to use parallelism through a pipeline pattern. With our pipeline, some partial computations are performed in advance so that the final time required to update the factorization is greatly reduced.
    This work was supported by the Spanish Ministry of Economy and Competitiveness under MINECO and FEDER projects TIN2014-53495-R and TEC2015-67387-C4-1-R.
    Dolz, MF.; Alventosa, FJ.; Alonso-Jordá, P.; Vidal Maciá, AM. (2019). A Pipeline for the QR Update in Digital Signal Processing. Computational and Mathematical Methods. 1:1-13. https://doi.org/10.1002/cmm4.1022

    Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs

    Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications, which involves the processing of multiple sources, channels, or filters. A general scenario that appears in this context is the immersive reproduction of binaural audio without the use of headphones, which requires the use of a crosstalk canceler. However, generalized crosstalk cancellation and equalization (GCCE) requires high computing capacity, which is a considerable limitation for real-time applications. This paper discusses the design and implementation of all the processing blocks of a multichannel convolution on a GPU for real-time applications. To this end, a very efficient filtering method using specific data structures is proposed, which takes advantage of overlap-save filtering and filter fragmentation. It has been shown that, for a real-time application with 22 inputs and 64 outputs, the system is capable of managing 1408 filters of 2048 coefficients with a latency time less than 6 ms. The proposed GPU implementation can be easily adapted to any acoustic environment, demonstrating the validity of these co-processors for managing intensive multichannel audio applications.
    This work has been partially funded by Spanish Ministerio de Ciencia e Innovacion TEC2009-13741, Generalitat Valenciana PROMETEO 2009/2013 and GV/2010/027, and Universitat Politecnica de Valencia through Programa de Apoyo a la Investigacion y Desarrollo (PAID-05-11).
    Belloch Rodríguez, JA.; Gonzalez, A.; Martínez Zaldívar, FJ.; Vidal Maciá, AM. (2013). Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs. Integrated Computer-Aided Engineering. 20(2):169-182. https://doi.org/10.3233/ICA-130422
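
The overlap-save filtering named in the entry above can be sketched for a single channel as follows. This is a minimal, dependency-free illustration and not the GPU implementation the entry describes. The function names and the `block_len` parameter are assumptions, and the circular convolution is computed directly here, whereas a practical implementation would use FFTs of length `block_len`.

```python
def circular_convolve(block, h):
    """Circular convolution of `block` with filter `h` (len(h) <= len(block)).
    Computed directly for clarity; an FFT would normally be used instead."""
    n = len(block)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j, hj in enumerate(h):
            acc += hj * block[(i - j) % n]
        out[i] = acc
    return out


def overlap_save(x, h, block_len):
    """Filter signal `x` with FIR filter `h` using overlap-save blocks.
    Returns the first len(x) samples of the linear convolution."""
    m = len(h)
    if block_len < m:
        raise ValueError("block_len must be at least len(h)")
    hop = block_len - m + 1              # new input samples consumed per block
    padded = [0.0] * (m - 1) + list(x)   # initial overlap is all zeros
    y = []
    start = 0
    while start < len(x):
        block = padded[start:start + block_len]
        block += [0.0] * (block_len - len(block))  # zero-pad the last block
        # The first m - 1 outputs are corrupted by wrap-around: discard them.
        y.extend(circular_convolve(block, h)[m - 1:])
        start += hop
    return y[:len(x)]


def direct_convolve(x, h):
    """Reference: first len(x) samples of the linear convolution of x and h."""
    return [
        sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
        for n in range(len(x))
    ]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    h = [1.0, -1.0, 0.5]
    print(overlap_save(x, h, block_len=4))
    print(direct_convolve(x, h))
```

Each block reuses the last len(h) - 1 input samples of the previous block, which is why only `hop` new samples are consumed per step. The filter count quoted in the entry is consistent with one filter per input-output pair, since 22 × 64 = 1408.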

    Analysis of an efficient parallel implementation of active-set Newton algorithm

    This paper presents an analysis of an efficient parallel implementation of the active-set Newton algorithm (ASNA), which is used to estimate the nonnegative weights of linear combinations of the atoms in a large-scale dictionary to approximate an observation vector by minimizing the Kullback–Leibler divergence between the observation vector and the approximation. The performance of ASNA has been demonstrated in previous works against other state-of-the-art methods. The implementations analysed in this paper have been developed in C, using parallel programming techniques to obtain better performance on multicore architectures than the original MATLAB implementation. A hardware analysis is also performed to check the influence of CPU frequency and number of CPU cores on the different implementations proposed. The new implementations allow the ASNA algorithm to tackle real-time problems thanks to the reduction in execution time obtained.
    This work has been partially supported by Programa de FPU del MECD, by MINECO and FEDER from Spain, under the projects TEC2015-67387-C4-1-R, and by project PROMETEO FASE II 2014/003 of Generalitat Valenciana. The authors want to thank Dr. Konstantinos Drossos for some very useful mind changing discussions. This work has been conducted in Laboratory of Signal Processing, Tampere University of Technology.
    San Juan-Sebastian, P.; Virtanen, T.; García Mollá, VM.; Vidal Maciá, AM. (2018). Analysis of an efficient parallel implementation of active-set Newton algorithm. The Journal of Supercomputing. 75(3):1298-1309. https://doi.org/10.1007/s11227-018-2423-5

    Parallel SUMIS Soft Detector for Large MIMO Systems on Multicore and GPU

    The number of transmit and receive antennas is an important factor that affects the performance and complexity of a MIMO system. A MIMO system with a very large number of antennas is a promising candidate technology for the next generations of wireless systems. However, the vast majority of the methods proposed for conventional MIMO systems are not suitable for large dimensions. In this context, the use of high-performance computing systems, such as multicore CPUs and graphics processing units, has become attractive for efficient implementation of parallel signal processing algorithms with high computational requirements. In the present work, two practical parallel approaches of the Subspace Marginalization with Interference Suppression detector for large MIMO systems have been proposed. Both approaches have been evaluated and compared in terms of performance and complexity with other detectors for different system parameters.
    This work has been partially supported by the Spanish MINECO Grant RACHEL TEC2013-47141-C4-4-R, the PROMETEO FASE II 2014/003 Project and FPU AP-2012/71274.
    Ramiro Sánchez, C.; Simarro, MA.; Gonzalez, A.; Vidal Maciá, AM. (2019). Parallel SUMIS Soft Detector for Large MIMO Systems on Multicore and GPU. The Journal of Supercomputing. 75(3):1256-1267. https://doi.org/10.1007/s11227-018-2403-9